Notes on the semantics of programming languages

The semantics (the meaning) of a programming language (or better, the meaning of the programs written in that language) is defined by the semantics of its syntactic constructs, that is, the semantics associated with the non-terminal symbols and the production rules. For example, the meaning of a constant definition (in a program) is the value of the constant; the meaning of the definition of a data type is normally the set of values that belong to the type; the meaning of an expression is the value of that expression; and the meaning of a statement (possibly a compound statement) is the effect that its execution has on the values stored in the program variables in memory.

Certain semantic properties can be evaluated during the compilation phase; they are called static semantics. For instance, in a language with static typing, the type of the value of an expression can be evaluated by the compiler, but the value of an expression depends in general on the values of variables which will only be known during the execution phase.

The static semantics is often represented by semantic attributes associated with the non-terminals. These attributes are evaluated by the compiler and their values are attached to the non-terminal nodes of the syntax tree. For example, each node <expression> in the syntax tree may have an attribute, called "type", which contains the type of the expression. For more details, see the book by Sebesta, Section 3.5.
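As an illustration of a static "type" attribute, the following Python sketch computes the type of an expression node from the types of its children, without ever computing the value of the expression. The node classes (Num, Bool, Add, Less) and the function type_of are hypothetical names chosen for this example; they are not taken from the book by Sebesta, and the toy language is assumed to have only the types int and bool.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:          # integer literal
    value: int


@dataclass
class Bool:         # boolean literal
    value: bool


@dataclass
class Add:          # left + right
    left: "Expr"
    right: "Expr"


@dataclass
class Less:         # left < right
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Bool, Add, Less]


def type_of(e: Expr) -> str:
    """Evaluate the static 'type' attribute of an expression node."""
    if isinstance(e, Num):
        return "int"
    if isinstance(e, Bool):
        return "bool"
    if isinstance(e, (Add, Less)):
        # The attribute of the parent depends only on the attributes
        # of the children, never on run-time values.
        lt, rt = type_of(e.left), type_of(e.right)
        if lt != "int" or rt != "int":
            raise TypeError(f"operands must be int, got {lt} and {rt}")
        return "int" if isinstance(e, Add) else "bool"
    raise ValueError(f"unknown node: {e!r}")


print(type_of(Less(Add(Num(1), Num(2)), Num(5))))  # prints "bool"
```

An expression such as Add(Num(1), Bool(True)) is rejected with a TypeError before any evaluation takes place, which is the kind of check a compiler for a statically typed language can perform.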

Different methods can be used to define the dynamic semantics of a programming language, that is, those aspects of "meaning" that are defined in reference to the execution phase of the program. These methods can be classified into the following categories:

Informal definition: Only informal explanations are given (in natural language) about the meaning of the syntactic constructs of the language (e.g. reference manual of the language).
Operational semantics: The meaning of a construct in the language is given by defining a translation into another language (normally lower level). Normally, only the translation is defined formally, while the meaning of the lower-level language is only defined informally.
Denotational semantics: The meaning of a statement of the language is given in the form of a (mathematical) function that takes as argument the state (values) of the variables before the execution of the statement, and returns as result the state of the variables after the execution of the statement. For instance, for an assignment statement, this function is the identity function for all variable values, except for the variable on the left-hand side of the assignment, which obtains a value equal to the value of the expression. We note that the meaning of an expression is a function that takes as argument the state of all variables in memory and returns as result the value of the expression (evaluated in the context of the current state of the variables, on which it may depend). The semantics of a loop statement is defined recursively by applying the function that represents the semantics of the loop body an indeterminate number of times.
Axiomatic semantics: The meaning of a statement is given by providing axioms about the properties of the function that represents the semantics (assuming that the meaning is given by a function, as in the case of denotational semantics). This approach is similar to the approach of defining the semantics of a procedure (independently of its code) by providing pre- and post-conditions that define properties for the input and result parameters and the values of local variables. In this context, Dijkstra introduced the concept of the weakest pre-condition: for a given statement and a given post-condition that should be satisfied after the execution of the statement, the weakest pre-condition is the weakest condition that, if satisfied before the execution of the statement, will assure that the post-condition is satisfied thereafter.
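
The denotational view and the weakest pre-condition can be sketched in Python by representing a state as a dictionary from variable names to values, an expression as a function from states to values, and a statement as a function from states to states. All names below (State, assign, seq, while_, wp) are assumptions made for this illustration. The function wp is only a sketch that is valid for deterministic statements that terminate: in that case the weakest pre-condition of a statement S and a post-condition Q holds in a state s exactly when Q holds in S(s).

```python
from typing import Callable, Dict

State = Dict[str, int]
Expr = Callable[[State], int]     # meaning of an expression: state -> value
Pred = Callable[[State], bool]    # a condition on states
Stmt = Callable[[State], State]   # meaning of a statement: state -> state


def assign(var: str, expr: Expr) -> Stmt:
    """Identity on all variables except var, which gets the value of expr."""
    def meaning(s: State) -> State:
        t = dict(s)           # copy, so the state "before" is left unchanged
        t[var] = expr(s)      # expr is evaluated in the state before execution
        return t
    return meaning


def seq(first: Stmt, second: Stmt) -> Stmt:
    """Sequential composition is function composition."""
    return lambda s: second(first(s))


def while_(cond: Pred, body: Stmt) -> Stmt:
    """Apply the meaning of the body as long as cond holds."""
    def meaning(s: State) -> State:
        while cond(s):
            s = body(s)
        return s
    return meaning


def wp(stmt: Stmt, post: Pred) -> Pred:
    """Weakest pre-condition, assuming stmt is deterministic and terminates."""
    return lambda s: post(stmt(s))


# total := 0; while n > 0 do (total := total + n; n := n - 1)
program = seq(
    assign("total", lambda s: 0),
    while_(
        lambda s: s["n"] > 0,
        seq(
            assign("total", lambda s: s["total"] + s["n"]),
            assign("n", lambda s: s["n"] - 1),
        ),
    ),
)
print(program({"n": 4}))  # prints {'n': 0, 'total': 10}

# Weakest pre-condition of "x := x + 1" for the post-condition "x > 5":
pre = wp(assign("x", lambda s: s["x"] + 1), lambda s: s["x"] > 5)
print(pre({"x": 5}), pre({"x": 4}))  # prints True False
```

The computed pre-condition behaves like the condition x > 4, which is what one obtains by substituting x + 1 for x in the post-condition x > 5.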

An example: The definition of the semantics of regular expressions

Informal definition: see first page of the article "Using Lex" (in the printed course notes).
Informal definition, but already much more formalized: The definition given in the course notes:
- The regular expressions represent a simple formalism for defining a (regular) formal language over a given alphabet. The regular expressions use the symbols of the alphabet and the empty word ε as constants and use the language operators concatenation, union, and Kleene's * (discussed above) for constructing the defined language. The notation for the operators is not uniform; for the union, the notations "+" and "|" are used. The different operators have different priorities; normally Kleene's operator has the highest priority. The following are the syntax rules for regular expressions (as used in these course notes) together with an explanation of their semantics:
Formal definition in denotational style: Using the syntax for regular expressions given earlier and the notation for defining semantic attributes given in the book by Sebesta, we can give the following definition of the semantics of regular expressions. Note: If we compare the definition below with the one given at the beginning of this web page in connection with the meta-model of regular expressions, we can see that the two definitions are equivalent.

Declaration of the semantic attributes: the non-terminal RegExpr has an attribute called S (for "semantics") of type "set of sequences of terminal symbols".
Grammar and attribute evaluation rules:
- RegExpr[1] --> ( RegExpr[2] )   semantic rule: RegExpr[1].S <-- RegExpr[2].S
- RegExpr --> a   semantic rule: RegExpr.S <-- {a} (and similarly for b and c)
- RegExpr --> ε   semantic rule: RegExpr.S <-- {ε}
- RegExpr[1] --> RegExpr[2] RegExpr[3]   semantic rule: RegExpr[1].S <-- RegExpr[2].S concatenate RegExpr[3].S
- RegExpr[1] --> RegExpr[2] + RegExpr[3]   semantic rule: RegExpr[1].S <-- RegExpr[2].S union RegExpr[3].S
- RegExpr[1] --> RegExpr[2] *   semantic rule: RegExpr[1].S <-- Kleene's closure of (RegExpr[2].S)
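
The attribute evaluation rules above can be tried out with the following Python sketch. Since Kleene's closure generally yields an infinite set, the sketch only computes the words of S up to a given length max_len; this bound, the tuple encoding of the syntax tree, and the function names are assumptions made for this example. The empty word ε is represented by the empty Python string, and parentheses need no node of their own because the rule for ( RegExpr ) simply passes S through.

```python
from typing import FrozenSet

Lang = FrozenSet[str]

# Syntax tree encoding (hypothetical):
#   ("sym", "a")     a symbol of the alphabet
#   ("eps",)         the empty word
#   ("cat", r1, r2)  concatenation r1 r2
#   ("alt", r1, r2)  union r1 + r2
#   ("star", r)      Kleene's closure r*


def concat(x: Lang, y: Lang, max_len: int) -> Lang:
    """Concatenation of two languages, keeping words of length <= max_len."""
    return frozenset(u + v for u in x for v in y if len(u) + len(v) <= max_len)


def star(x: Lang, max_len: int) -> Lang:
    """Kleene's closure of x, restricted to words of length <= max_len."""
    result = frozenset({""})
    frontier = result
    while frontier:
        # Extend only the words found in the previous round; stop when
        # no new word of admissible length appears.
        frontier = concat(frontier, x, max_len) - result
        result = result | frontier
    return result


def S(r: tuple, max_len: int) -> Lang:
    """Evaluate the attribute S of a regular expression, bounded by max_len."""
    tag = r[0]
    if tag == "sym":
        return frozenset({r[1]}) if max_len >= 1 else frozenset()
    if tag == "eps":
        return frozenset({""})
    if tag == "cat":
        return concat(S(r[1], max_len), S(r[2], max_len), max_len)
    if tag == "alt":
        return S(r[1], max_len) | S(r[2], max_len)
    if tag == "star":
        return star(S(r[1], max_len), max_len)
    raise ValueError(f"unknown node: {r!r}")


# The regular expression a b* with words up to length 3:
a_b_star = ("cat", ("sym", "a"), ("star", ("sym", "b")))
print(sorted(S(a_b_star, 3)))  # prints ['a', 'ab', 'abb']
```

Each branch of S corresponds to one semantic rule of the grammar, so the value attached to the root of the syntax tree is the defined language, truncated at max_len.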